Biostat 203B Homework 1

Due Jan 26, 2024 @ 11:59PM

Author

Ziheng Zhang_606300061

Display machine information for reproducibility:

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.2 compiler_4.3.2    fastmap_1.1.1     cli_3.6.1        
 [5] tools_4.3.2       htmltools_0.5.7   rstudioapi_0.15.0 yaml_2.3.7       
 [9] rmarkdown_2.25    knitr_1.45        jsonlite_1.8.7    xfun_0.41        
[13] digest_0.6.33     rlang_1.1.2       evaluate_0.23    

Q1. Git/GitHub

No handwritten homework reports are accepted for this course. We work with Git and GitHub. Efficient and abundant use of Git, e.g., frequent and well-documented commits, is an important criterion for grading your homework.

  1. Apply for the Student Developer Pack at GitHub using your UCLA email. You’ll get a GitHub Pro account for free (unlimited public and private repositories).

  2. Create a private repository biostat-203b-2024-winter and add Hua-Zhou and TA team (Tomoki-Okuno for Lec 1; jonathanhori and jasenzhang1 for Lec 80) as your collaborators with write permission.

  3. The top-level directories of the repository should be hw1, hw2, … Maintain two branches, main and develop. The develop branch will be your main playground, the place where you develop solutions (code) to homework problems and write up reports. The main branch will be your presentation area. Submit your homework files (Quarto file qmd, html file converted by Quarto, all code and extra data sets to reproduce results) in the main branch.

  4. After each homework due date, the course reader and instructor will check out your main branch for grading. Tag each of your homework submissions with tag names hw1, hw2, … Tagging time will be used as your submission time. That means if you tag your hw1 submission after the deadline, penalty points will be deducted for late submission.

  5. After this course, you can make this repository public and use it to demonstrate your skill sets on the job market.

Answer: I have finished all the steps above.
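
For reference, the branch-and-tag workflow in items 3 and 4 can be sketched in a throwaway repository so that nothing touches the real one. The directory, file contents, identity, and commit messages below are hypothetical; in the real biostat-203b-2024-winter clone the same commands would be followed by git push origin main develop and git push origin hw1.

```shell
#!/usr/bin/env bash
# Sketch of the main/develop workflow plus a submission tag, in a scratch repo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main          # name the initial branch "main"
git config user.name "Example Student"         # hypothetical identity, local to this repo
git config user.email "student@example.com"

mkdir hw1
echo "# HW1" > hw1/hw1.qmd
git add hw1
git commit -q -m "Start hw1"

git checkout -q -b develop                     # day-to-day work happens on develop
echo "Answer to Q1" >> hw1/hw1.qmd
git commit -q -am "Answer Q1"

git checkout -q main                           # main is the presentation branch
git merge -q develop                           # bring the finished work into main
git tag hw1                                    # the tag time counts as submission time
git branch --list
git tag --list
```

Merging develop into main before tagging keeps the graded branch identical to the finished work.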

Q2. Data ethics training

This exercise (and later in this course) uses the MIMIC-IV data v2.2, a freely accessible critical care database developed by the MIT Lab for Computational Physiology. Follow the instructions at https://mimic.mit.edu/docs/gettingstarted/ to (1) complete the CITI Data or Specimens Only Research course and (2) obtain the PhysioNet credential for using the MIMIC-IV data. Display the verification links to your completion report and completion certificate here. You must complete Q2 before working on the remaining questions. (Hint: The CITI training takes a few hours and the PhysioNet credentialing takes a couple days; do not leave it to the last minute.)

Answer: I have finished the CITI training and obtained the PhysioNet credential. The verification link to my completion report is as follows: https://www.citiprogram.org/verify/?kb2b9764b-f91d-471b-a439-9448b4e5f4a2-60470489. The verification link to my completion certificate is as follows: https://www.citiprogram.org/verify/?wb9c162da-a9ca-4f72-9484-118cc2b0f9ca-60470489. The PhysioNet credential is shown as follows:

Q3. Linux Shell Commands

  1. Make the MIMIC v2.2 data available at location ~/mimic.

Answer: I downloaded the MIMIC v2.2 data into the folder ~/mimic. As instructed, the data files are not committed to Git, not copied into my repository directory, and the .gz files are not decompressed. The following Bash command lists the contents of ~/mimic.

ls -l ~/mimic/
total 48
-rw-rw-r--@  1 zihengzhang  staff  13332 Jan  5  2023 CHANGELOG.txt
-rw-rw-r--@  1 zihengzhang  staff   2518 Jan  5  2023 LICENSE.txt
-rw-rw-r--@  1 zihengzhang  staff   2884 Jan  6  2023 SHA256SUMS.txt
drwxr-xr-x@ 24 zihengzhang  staff    768 Jan 13 12:28 hosp
drwxr-xr-x@ 11 zihengzhang  staff    352 Jan 13 12:28 icu

Refer to the documentation https://physionet.org/content/mimiciv/2.2/ for details of the data files. Please do not put these data files into Git; they are big. Do not copy them into your directory. Do not decompress the gz data files. These practices create unnecessarily big files and are not big-data friendly. Read from the data folder ~/mimic directly in the following exercises.
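
One way to make the data available at ~/mimic without copying anything is a symbolic link from ~/mimic to wherever the download actually lives. The sketch below assumes the files were downloaded to some other directory; the paths are temporary stand-ins created by the script, not the real download location.

```shell
#!/usr/bin/env bash
# Expose a data directory under a fixed name via a symlink, so no data is copied.
base=$(mktemp -d)
download="$base/physionet/mimiciv/2.2"          # hypothetical stand-in for the real download folder
mkdir -p "$download/hosp" "$download/icu"
printf 'subject_id\n10000032\n' | gzip > "$download/hosp/patients.csv.gz"

link="$base/mimic"                               # stand-in for ~/mimic
ln -s "$download" "$link"

ls -l "$link/"                                   # trailing slash lists the target's contents
zcat < "$link/hosp/patients.csv.gz" | head -n 2
```

With the real data this would be ln -s /path/to/download ~/mimic, where the target is whatever directory the PhysioNet download produced.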

Use Bash commands to answer the following questions.

  2. Display the contents of the folders hosp and icu using the Bash command ls -l. Why are these data files distributed as .csv.gz files instead of .csv (comma-separated values) files? Read the page https://mimic.mit.edu/docs/iv/ to understand what’s in each folder.

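Answer: The data files are distributed as .csv.gz rather than .csv files because gzip compression makes them much smaller. CSV text is highly repetitive (the same IDs, codes, and timestamp formats recur on every line), so it compresses well: labevents.csv.gz, for example, is 1.8G compressed versus 13G uncompressed, as measured later in this question. Smaller files take up less storage space, transfer more quickly over networks, and are easier to store and back up. Because gzip is a streaming format, the files can still be read line by line with tools such as zcat without first writing a decompressed copy to disk.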
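
The storage saving is easy to see on synthetic data. This small sketch assumes nothing about MIMIC itself: it writes a CSV with made-up column names and values that repeat the way clinical event tables do, gzips a copy, and compares byte counts.

```shell
#!/usr/bin/env bash
# Generate a repetitive CSV, gzip a copy, and compare the two sizes in bytes.
dir=$(mktemp -d)
csv="$dir/demo.csv"
{
  echo "subject_id,itemid,charttime,value"       # made-up header
  for i in $(seq 1 20000); do
    echo "$((10000000 + i % 500)),50912,2180-05-06 22:23:00,$((i % 7))"
  done
} > "$csv"

gzip -c "$csv" > "$csv.gz"                       # -c writes to stdout, so demo.csv is kept
raw=$(wc -c < "$csv" | tr -d ' ')
packed=$(wc -c < "$csv.gz" | tr -d ' ')
echo "uncompressed: $raw bytes"
echo "compressed:   $packed bytes"
echo "roughly $((raw / packed))x smaller"
```

On the real files, gzip -l would normally report both sizes, but the gzip format stores the uncompressed size modulo 2^32, so that column is not reliable for files larger than 4 GiB such as the uncompressed labevents.csv.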

The following Bash commands list the contents of the hosp and icu folders. The hosp folder contains data acquired from the hospital-wide electronic health record, including patient and admission information, laboratory measurements, microbiology, medication administration, and billed diagnoses.

ls -l ~/mimic/hosp/
total 8859752
-rw-rw-r--@ 1 zihengzhang  staff    15516088 Jan  5  2023 admissions.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff      427468 Jan  5  2023 d_hcpcs.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff      859438 Jan  5  2023 d_icd_diagnoses.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff      578517 Jan  5  2023 d_icd_procedures.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff       12900 Jan  5  2023 d_labitems.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    25070720 Jan  5  2023 diagnoses_icd.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff     7426955 Jan  5  2023 drgcodes.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   508524623 Jan  5  2023 emar.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   471096030 Jan  5  2023 emar_detail.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff     1767138 Jan  5  2023 hcpcsevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff  1939088924 Jan  5  2023 labevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    96698496 Jan  5  2023 microbiologyevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    36124944 Jan  5  2023 omr.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff     2312631 Jan  5  2023 patients.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   398753125 Jan  5  2023 pharmacy.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   498505135 Jan  5  2023 poe.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    25477219 Jan  5  2023 poe_detail.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   458817415 Jan  5  2023 prescriptions.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff     6027067 Jan  5  2023 procedures_icd.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff      122507 Jan  5  2023 provider.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff     6781247 Jan  5  2023 services.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    36158338 Jan  5  2023 transfers.csv.gz

The icu folder contains information collected from the clinical information system used within the ICU. Documented data includes intravenous administrations, ventilator settings, and other charted items.

ls -l ~/mimic/icu/
total 6155968
-rw-rw-r--@ 1 zihengzhang  staff       35893 Jan  5  2023 caregiver.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff  2467761053 Jan  5  2023 chartevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff       57476 Jan  5  2023 d_items.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    45721062 Jan  5  2023 datetimeevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff     2614571 Jan  5  2023 icustays.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   251962313 Jan  5  2023 ingredientevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff   324218488 Jan  5  2023 inputevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    38747895 Jan  5  2023 outputevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff    20717852 Jan  5  2023 procedureevents.csv.gz

  3. Briefly describe what the Bash commands zcat, zless, zmore, and zgrep do.

Answer:
The zcat command decompresses a gzip file and writes its contents to standard output; the .gz file on disk is left unchanged and no decompressed copy is saved, so the output can be piped into other commands.
The zless command opens a compressed file in the pager less, one screen at a time; you can scroll forward and backward line by line or page by page, and search with /.
The zmore command opens a compressed file in the pager more, which pages forward through the file (space bar for the next screen, Enter for the next line).
The zgrep command searches a compressed file for lines matching a specified pattern; it accepts the usual grep options, such as -c to count matching lines.
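
A short sketch of zcat and zgrep on a throwaway gzip file; the rows are made up in the admissions layout. zless and zmore are interactive pagers, so they appear only in the comments.

```shell
#!/usr/bin/env bash
# Build a tiny gzip file, then read and search it without writing a decompressed copy.
dir=$(mktemp -d)
gz="$dir/admissions_demo.csv.gz"
printf '%s\n' \
  'subject_id,hadm_id,admission_type' \
  '10000032,22595853,URGENT' \
  '10000032,22841357,EW EMER.' \
  '10000068,25022803,EU OBSERVATION' | gzip > "$gz"

zcat < "$gz" | head -n 2          # stream decompressed text to stdout; keep 2 lines
zgrep -c 'EMER' "$gz"             # grep options pass through: -c counts matching lines
zgrep '^10000032,' "$gz"          # all lines for one subject_id
# Interactive equivalents: zless "$gz" (scroll both ways, / to search),
# zmore "$gz" (page forward with the space bar).
ls "$dir"                         # only the .gz file exists; nothing was decompressed to disk
```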

  4. (Looping in Bash) What’s the output of the following bash script?
for datafile in ~/mimic/hosp/{a,l,pa}*.gz
do
  ls -l $datafile
done
-rw-rw-r--@ 1 zihengzhang  staff  15516088 Jan  5  2023 /Users/zihengzhang/mimic/hosp/admissions.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff  1939088924 Jan  5  2023 /Users/zihengzhang/mimic/hosp/labevents.csv.gz
-rw-rw-r--@ 1 zihengzhang  staff  2312631 Jan  5  2023 /Users/zihengzhang/mimic/hosp/patients.csv.gz

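Answer: The script prints the long listing (permissions, owner, group, size in bytes, and modification date) of every file in ~/mimic/hosp/ that matches the pattern. Bash first applies brace expansion, turning {a,l,pa}*.gz into the three glob patterns a*.gz, l*.gz, and pa*.gz; each glob then matches files starting with “a”, “l”, or “pa” and ending in .gz. Here that is exactly admissions.csv.gz, labevents.csv.gz, and patients.csv.gz. The prefix “pa” (rather than “p”) excludes pharmacy, poe, poe_detail, prescriptions, procedures_icd, and provider.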
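
The two expansion steps can be seen separately with echo. This sketch runs in a scratch directory holding empty stand-in files named after some of the hosp files: brace expansion happens first and yields three glob patterns, and only then does each pattern match file names.

```shell
#!/usr/bin/env bash
# Show brace expansion, then glob matching, on empty stand-in files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch admissions.csv.gz labevents.csv.gz patients.csv.gz \
      pharmacy.csv.gz poe.csv.gz provider.csv.gz

echo '{a,l,pa}*.gz' "->" {a,l,pa}\*.gz   # braces expanded; escaped * is not globbed
echo {a,l,pa}*.gz                        # braces expanded, then each glob matched
for datafile in {a,l,pa}*.gz; do
  echo "matched: $datafile"
done
```

The first echo prints the intermediate patterns a*.gz l*.gz pa*.gz; the second prints only the three files the loop above listed, since pharmacy, poe, and provider do not start with "pa".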

Display the number of lines in each data file using a similar loop. (Hint: combine linux commands zcat < and wc -l.)

Answer: The following Bash script displays the number of lines in each data file in the ~/mimic/hosp/ directory: admissions.csv.gz has 431232 lines, labevents.csv.gz has 118171368 lines, and patients.csv.gz has 299713 lines. Each count includes the header line.

for datafile in ~/mimic/hosp/{a,l,pa}*.gz
do
  echo "$datafile"
  zcat < "$datafile" | wc -l
done
/Users/zihengzhang/mimic/hosp/admissions.csv.gz
  431232
/Users/zihengzhang/mimic/hosp/labevents.csv.gz
 118171368
/Users/zihengzhang/mimic/hosp/patients.csv.gz
  299713
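
Because each count above includes the header, a variant of the same loop can report data rows directly. This sketch assumes every file has exactly one header line (true for the MIMIC CSVs shown here); the helper name count_rows is my own, and the demo uses two small generated files so it runs anywhere.

```shell
#!/usr/bin/env bash
# count_rows (hypothetical helper): print "<file name>: <data rows>", minus one header line.
count_rows() {
  local f n
  for f in "$@"; do
    n=$(zcat < "$f" | wc -l)
    printf '%s: %d\n' "$(basename "$f")" "$((n - 1))"
  done
}

dir=$(mktemp -d)
printf 'subject_id\n1\n2\n3\n' | gzip > "$dir/admissions.csv.gz"
printf 'subject_id\n1\n2\n' | gzip > "$dir/patients.csv.gz"
count_rows "$dir"/*.gz
```

Applied as count_rows ~/mimic/hosp/{a,l,pa}*.gz, it would report the counts above minus one: 431231, 118171367, and 299712 rows.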

  5. Display the first few lines of admissions.csv.gz. How many rows are in this data file? How many unique patients (identified by subject_id) are in this data file? Do they match the number of patients listed in the patients.csv.gz file? (Hint: combine Linux commands zcat <, head/tail, awk, sort, uniq, wc, and so on.)

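Answer: The following Bash commands display the first five lines of admissions.csv.gz (the header plus four admissions) and count the unique patients. The file has 431232 lines, i.e., 431231 admission rows plus the header line. These admissions belong to 180733 unique patients (subject_id), which does not match the 299712 patients listed in patients.csv.gz (299713 lines minus the header). So not every patient in patients.csv.gz has an admission recorded in admissions.csv.gz.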

zcat < ~/mimic/hosp/admissions.csv.gz | head -5
subject_id,hadm_id,admittime,dischtime,deathtime,admission_type,admit_provider_id,admission_location,discharge_location,insurance,language,marital_status,race,edregtime,edouttime,hospital_expire_flag
10000032,22595853,2180-05-06 22:23:00,2180-05-07 17:15:00,,URGENT,P874LG,TRANSFER FROM HOSPITAL,HOME,Other,ENGLISH,WIDOWED,WHITE,2180-05-06 19:17:00,2180-05-06 23:30:00,0
10000032,22841357,2180-06-26 18:27:00,2180-06-27 18:49:00,,EW EMER.,P09Q6Y,EMERGENCY ROOM,HOME,Medicaid,ENGLISH,WIDOWED,WHITE,2180-06-26 15:54:00,2180-06-26 21:31:00,0
10000032,25742920,2180-08-05 23:44:00,2180-08-07 17:50:00,,EW EMER.,P60CC5,EMERGENCY ROOM,HOSPICE,Medicaid,ENGLISH,WIDOWED,WHITE,2180-08-05 20:58:00,2180-08-06 01:44:00,0
10000032,29079034,2180-07-23 12:35:00,2180-07-25 17:55:00,,EW EMER.,P30KEH,EMERGENCY ROOM,HOME,Medicaid,ENGLISH,WIDOWED,WHITE,2180-07-23 05:54:00,2180-07-23 14:00:00,0
zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $1}' \
| sort -u | wc -l
  180733
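
To see the mismatch from the other side, the sorted ID lists of the two files can be compared with comm. This is a minimal sketch assuming, as in MIMIC, that subject_id is the first column of both files; the helper name ids is my own and the two files are tiny made-up examples.

```shell
#!/usr/bin/env bash
# Count subject_ids that appear in patients but never in admissions.
ids() {                                     # hypothetical helper: unique first-column values, header skipped
  zcat < "$1" | tail -n +2 | cut -d, -f1 | sort -u
}

dir=$(mktemp -d)
printf 'subject_id,hadm_id\n1,11\n1,12\n3,13\n' | gzip > "$dir/admissions.csv.gz"
printf 'subject_id,gender\n1,F\n2,M\n3,F\n4,M\n' | gzip > "$dir/patients.csv.gz"

ids "$dir/admissions.csv.gz" > "$dir/adm_ids.txt"
ids "$dir/patients.csv.gz" > "$dir/pat_ids.txt"
# comm needs sorted input; -13 keeps only lines unique to the second file.
comm -13 "$dir/adm_ids.txt" "$dir/pat_ids.txt" | wc -l
```

If every subject_id in admissions also appears in patients, the same comparison on the real files would give 299712 − 180733 = 118979 patients without an admission.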

  6. What are the possible values taken by each of the variables admission_type, admission_location, insurance, and race? Also report the count for each unique value of these variables. (Hint: combine Linux commands zcat, head/tail, awk, uniq -c, wc, and so on; skip the header line.)

Answer: The following bash script displays the possible values and the count for each unique value of the variable admission_type. There are 9 unique values for the variable admission_type, not including the header line. The counts for each unique value are as follows.

# tail -n +2 skips the header line; sort groups identical values for uniq
zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $6}' \
| sort | uniq | wc -l
zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $6}' \
| sort | uniq -c
       9
6626 AMBULATORY OBSERVATION
19554 DIRECT EMER.
18707 DIRECT OBSERVATION
10565 ELECTIVE
94776 EU OBSERVATION
149413 EW EMER.
52668 OBSERVATION ADMIT
34231 SURGICAL SAME DAY ADMISSION
44691 URGENT

Answer: The following bash script displays the possible values and the count for each unique value of the variable admission_location. There are 11 unique values for the variable admission_location, not including the header line. The counts for each unique value are as follows.

zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $8}' \
| sort | uniq | wc -l
zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $8}' \
| sort | uniq -c
      11
 185 AMBULATORY SURGERY TRANSFER
10008 CLINIC REFERRAL
232595 EMERGENCY ROOM
 359 INFORMATION NOT AVAILABLE
4205 INTERNAL TRANSFER TO OR FROM PSYCH
5479 PACU
114963 PHYSICIAN REFERRAL
7804 PROCEDURE SITE
35974 TRANSFER FROM HOSPITAL
3843 TRANSFER FROM SKILLED NURSING FACILITY
15816 WALK-IN/SELF REFERRAL

Answer: The following bash script displays the possible values and the count for each unique value of the variable insurance. There are 3 unique values for the variable insurance, not including the header line. The counts for each unique value are as follows.

zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $10}' \
| sort | uniq | wc -l
zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $10}' \
| sort | uniq -c
       3
41330 Medicaid
160560 Medicare
229341 Other

Answer: The following bash script displays the possible values and the count for each unique value of the variable race. There are 33 unique values for the variable race, not including the header line. The counts for each unique value are as follows.

zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $13}' \
| sort | uniq | wc -l
zcat < ~/mimic/hosp/admissions.csv.gz | tail -n +2 | awk -F, '{print $13}' \
| sort | uniq -c
      33
 919 AMERICAN INDIAN/ALASKA NATIVE
6156 ASIAN
1198 ASIAN - ASIAN INDIAN
5587 ASIAN - CHINESE
 506 ASIAN - KOREAN
1446 ASIAN - SOUTH EAST ASIAN
2530 BLACK/AFRICAN
59959 BLACK/AFRICAN AMERICAN
4765 BLACK/CAPE VERDEAN
2704 BLACK/CARIBBEAN ISLAND
7754 HISPANIC OR LATINO
 437 HISPANIC/LATINO - CENTRAL AMERICAN
 639 HISPANIC/LATINO - COLUMBIAN
 500 HISPANIC/LATINO - CUBAN
4383 HISPANIC/LATINO - DOMINICAN
1330 HISPANIC/LATINO - GUATEMALAN
 536 HISPANIC/LATINO - HONDURAN
 665 HISPANIC/LATINO - MEXICAN
8076 HISPANIC/LATINO - PUERTO RICAN
 892 HISPANIC/LATINO - SALVADORAN
 560 MULTIPLE RACE/ETHNICITY
 386 NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER
15102 OTHER
1761 PATIENT DECLINED TO ANSWER
1510 PORTUGUESE
 505 SOUTH AMERICAN
1603 UNABLE TO OBTAIN
10668 UNKNOWN
272932 WHITE
1103 WHITE - BRAZILIAN
1170 WHITE - EASTERN EUROPEAN
7925 WHITE - OTHER EUROPEAN
5024 WHITE - RUSSIAN
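
The four pipelines above differ only in the column number, so they can be folded into one loop. This sketch assumes, like awk -F, in those pipelines, that no field contains a quoted comma; the helper name tabulate is my own, and the demo file is a made-up three-column CSV rather than admissions.csv.gz.

```shell
#!/usr/bin/env bash
# tabulate (hypothetical helper): distinct values of one CSV column, header skipped, as "count value".
tabulate() {
  local file=$1 col=$2
  zcat < "$file" | tail -n +2 | awk -F, -v c="$col" '{print $c}' | sort | uniq -c
}

dir=$(mktemp -d)
printf '%s\n' \
  'subject_id,admission_type,insurance' \
  '1,URGENT,Medicare' \
  '2,ELECTIVE,Other' \
  '3,URGENT,Other' | gzip > "$dir/demo.csv.gz"

for col in 2 3; do
  name=$(zcat < "$dir/demo.csv.gz" | head -n 1 | cut -d, -f"$col")
  echo "== $name: $(tabulate "$dir/demo.csv.gz" "$col" | wc -l | tr -d ' ') distinct values"
  tabulate "$dir/demo.csv.gz" "$col"
done
```

For the real file, the loop would run over columns 6, 8, 10, and 13 of ~/mimic/hosp/admissions.csv.gz. Each call decompresses the file again; a single awk pass with one associative array per column would avoid that at the cost of a longer script.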

  7. To compress, or not to compress. That’s the question. Let’s focus on the big data file labevents.csv.gz. Compare the compressed gz file size to the uncompressed file size. Compare the run times of zcat < ~/mimic/hosp/labevents.csv.gz | wc -l versus wc -l labevents.csv. Discuss the trade-off between storage and speed for big data files. (Hint: gzip -dk < FILENAME.gz > ./FILENAME. Remember to delete the large labevents.csv file after the exercise.)

Answer: The following Bash commands compare the compressed gz file size with the uncompressed file size. The compressed file is 1.8G, while the uncompressed file is 13G, about seven times larger.

gzip -dk < ~/mimic/hosp/labevents.csv.gz > ./labevents.csv
du -h ~/mimic/hosp/labevents.csv.gz
du -h labevents.csv 

Answer: The following Bash commands compare the run times of zcat < ~/mimic/hosp/labevents.csv.gz | wc -l and wc -l labevents.csv. On my machine, the first took around 13.914 s and the second around 15.861 s.

time zcat < ~/mimic/hosp/labevents.csv.gz | wc -l
time wc -l labevents.csv

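Answer: The usual expectation is that compression saves storage at the cost of speed: reading a compressed file requires extra CPU time for decompression, while an uncompressed file takes more space but can be read directly. In practice, the balance depends on whether reading is limited by the disk or by the CPU. On my computer, zcat < ~/mimic/hosp/labevents.csv.gz | wc -l was actually faster than wc -l labevents.csv, likely because reading 13G from disk took longer than reading 1.8G and decompressing it (zcat and wc also run concurrently in the pipeline). So for big data files, compression can save both storage and, when disk I/O is the bottleneck, time.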

Answer: Finally, delete the large labevents.csv file.

rm labevents.csv
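
The whole experiment can be scripted so that the large decompressed copy is removed even if a step fails or is interrupted. This sketch uses a trap on EXIT inside a subshell function; the name compare_gz is hypothetical, and the demo runs on a small generated file rather than labevents.csv.gz.

```shell
#!/usr/bin/env bash
# compare_gz (hypothetical helper): decompress to a temp file, compare sizes and
# line-count timings, and always delete the temp file afterwards.
compare_gz() (                     # subshell body, so the trap is local to this call
  gz=$1
  tmp=$(mktemp)
  trap 'rm -f "$tmp"' EXIT         # runs when the subshell exits, even after an error
  gzip -dc < "$gz" > "$tmp"
  echo "compressed:   $(wc -c < "$gz" | tr -d ' ') bytes"
  echo "uncompressed: $(wc -c < "$tmp" | tr -d ' ') bytes"
  time (zcat < "$gz" | wc -l)      # timings go to stderr
  time wc -l < "$tmp"
)

dir=$(mktemp -d)
seq 1 100000 | gzip > "$dir/demo.csv.gz"
compare_gz "$dir/demo.csv.gz"
```

Called as compare_gz ~/mimic/hosp/labevents.csv.gz, it would repeat the comparison above; the temporary copy is written under $TMPDIR, which then needs roughly 13G of free space.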

Q5. More fun with Linux

Try the following commands in Bash and interpret the results: cal, cal 2024, cal 9 1752 (anything unusual?), date, hostname, arch, uname -a, uptime, who am i, who, w, id, last | head, echo {con,pre}{sent,fer}{s,ed}, time sleep 5, history | tail.

cal
    January 2024      
Su Mo Tu We Th Fr Sa  
    1  2  3  4  5  6  
 7  8  9 10 11 12 13  
14 15 16 17 18 19 20  
21 22 23 24 25 26 27  
28 29 30 31           
                      

Answer: The cal command displays a calendar for the current month.

cal 2024
                            2024
      January               February               March          
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  
    1  2  3  4  5  6               1  2  3                  1  2  
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   3  4  5  6  7  8  9  
14 15 16 17 18 19 20  11 12 13 14 15 16 17  10 11 12 13 14 15 16  
21 22 23 24 25 26 27  18 19 20 21 22 23 24  17 18 19 20 21 22 23  
28 29 30 31           25 26 27 28 29        24 25 26 27 28 29 30  
                                            31                    

       April                  May                   June          
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  
    1  2  3  4  5  6            1  2  3  4                     1  
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8  
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15  
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22  
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29  
                                            30                    

        July                 August              September        
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  
    1  2  3  4  5  6               1  2  3   1  2  3  4  5  6  7  
 7  8  9 10 11 12 13   4  5  6  7  8  9 10   8  9 10 11 12 13 14  
14 15 16 17 18 19 20  11 12 13 14 15 16 17  15 16 17 18 19 20 21  
21 22 23 24 25 26 27  18 19 20 21 22 23 24  22 23 24 25 26 27 28  
28 29 30 31           25 26 27 28 29 30 31  29 30                 
                                                                  

      October               November              December        
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa  
       1  2  3  4  5                  1  2   1  2  3  4  5  6  7  
 6  7  8  9 10 11 12   3  4  5  6  7  8  9   8  9 10 11 12 13 14  
13 14 15 16 17 18 19  10 11 12 13 14 15 16  15 16 17 18 19 20 21  
20 21 22 23 24 25 26  17 18 19 20 21 22 23  22 23 24 25 26 27 28  
27 28 29 30 31        24 25 26 27 28 29 30  29 30 31              
                                                                  

Answer: The cal 2024 command displays a calendar for the year 2024.

cal 9 1752
   September 1752     
Su Mo Tu We Th Fr Sa  
       1  2 14 15 16  
17 18 19 20 21 22 23  
24 25 26 27 28 29 30  
                      
                      
                      

Answer: The cal 9 1752 command displays the calendar for September 1752. This month is unusual: it is missing 11 days, the 3rd through the 13th, so Wednesday, September 2 is followed directly by Thursday, September 14. This is because the British Empire and its colonies switched from the Julian calendar to the Gregorian calendar in September 1752. By then the Julian calendar had fallen 11 days behind the Gregorian calendar, so 11 dates were dropped to bring the two back into line.

date
Thu Jan 25 18:09:46 PST 2024

Answer: The date command displays the current date and time.

hostname
zzhMac.local

Answer: The hostname command displays the name of the current host system, here zzhMac.local. With administrator privileges, it can also set the system’s hostname; on Linux, related options also show the DNS (Domain Name System) domain name or the NIS (Network Information System) domain name.

arch
i386

Answer: The arch command displays the machine architecture. Here it prints i386, which is how macOS reports an Intel (x86) processor; uname -a below shows the hardware is x86_64.

uname -a
Darwin zzhMac.local 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:27 PDT 2023; root:xnu-10002.41.9~6/RELEASE_X86_64 x86_64

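Answer: The uname -a command displays the name, version, and other details of the operating system and hardware: here the kernel name (Darwin), the hostname (zzhMac.local), the kernel release (23.1.0), the full kernel version string, and the machine type (x86_64). The -a option means print all available information.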

uptime
18:09  up 16 days, 19:27, 1 user, load averages: 3.10 2.20 1.95

Answer: The uptime command displays the current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.

who am i
zihengzhang                   Jan 25 18:09 

Answer: The who am i command displays the login information of the current user.

who
zihengzhang      console      Jan  8 22:43 

Answer: The who command lists the users currently logged in to the UNIX or Linux system, with their terminal and login time. Here the only entry is zihengzhang, logged in on the console since Jan 8.

w
18:09  up 16 days, 19:27, 1 user, load averages: 3.02 2.20 1.95
USER     TTY      FROM              LOGIN@  IDLE WHAT
zihengzhang console  -                08Jan24 16days -

Answer: The w command displays the login information of all users and what they are doing.

id
uid=501(zihengzhang) gid=20(staff) groups=20(staff),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh),400(com.apple.access_remote_ae),701(com.apple.sharepoint.group.1)

Answer: The id command displays the current user’s user and group IDs. “uid” is the user ID, a number that uniquely identifies the user on the system (here 501, zihengzhang). “gid” is the ID of the user’s primary group (here 20, staff). “groups” lists every group the user belongs to, including the primary group, each shown as ID(name).

last | head
zihengzhang ttys001                         Wed Jan 24 16:29 - 16:29  (00:00)
zihengzhang ttys001                         Wed Jan 24 16:23 - 16:23  (00:00)
zihengzhang ttys001                         Wed Jan 24 16:14 - 16:14  (00:00)
zihengzhang ttys001                         Wed Jan 24 16:02 - 16:02  (00:00)
zihengzhang ttys001                         Wed Jan 24 16:01 - 16:01  (00:00)
zihengzhang ttys001                         Wed Jan 24 16:00 - 16:00  (00:00)
zihengzhang ttys000                         Mon Jan 22 10:42 - 10:42  (00:00)
zihengzhang ttys001                         Sat Jan 20 20:48 - 20:48  (00:00)
zihengzhang ttys000                         Thu Jan 18 18:40 - 18:40  (00:00)
zihengzhang ttys000                         Thu Jan 18 18:24 - 18:24  (00:00)

Answer: The last command lists recent login sessions, most recent first, and head keeps the first 10 lines, so last | head shows the 10 most recent logins.

echo {con,pre}{sent,fer}{s,ed}
consents consented confers confered presents presented prefers prefered

Answer: The echo {con,pre}{sent,fer}{s,ed} command uses brace expansion to generate every combination of the words in the braces. Bash takes one item from each pair of braces, from left to right, and concatenates them, giving 2 × 2 × 2 = 8 words. The expansion is purely textual, so it also produces misspellings such as “confered” and “prefered” (correctly spelled conferred and preferred).

time sleep 5

real    0m5.007s
user    0m0.001s
sys 0m0.002s

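Answer: The time sleep 5 command reports how long sleep 5 takes to run. The sleep command pauses for a fixed amount of time (here 5 seconds) before the next command runs. In the output, real is the elapsed wall-clock time (about 5 seconds), while user and sys are the CPU time spent in user mode and in the kernel; both are nearly zero because sleep just waits.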

history | tail

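Answer: The history | tail command displays the last 10 commands in the shell’s history. No output appears above, most likely because the code was run in a non-interactive shell when the document was rendered, and Bash does not record history in non-interactive shells by default.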

Q6. Book

  1. Git clone the repository https://github.com/christophergandrud/Rep-Res-Book for the book Reproducible Research with R and RStudio to your local machine.

  2. Open the project by clicking rep-res-3rd-edition.Rproj and compile the book by clicking Build Book in the Build panel of RStudio. (Hint: I was able to build git_book and epub_book but not pdf_book.)

The point of this exercise is (1) to get the book for free and (2) to see an example of how a complicated project such as a book can be organized in a reproducible way.

For grading purposes, include a screenshot of Section 4.1.5 of the book here.

Answer: